The present invention relates to microprocessor cache subsystems in
computer systems, and more specifically to a method for achieving
multilevel inclusion among first level and second level caches in a
computer system so that the second level cache controller can perform
the principal snooping responsibilities for both caches.

The personal computer industry is a vibrant and growing field that
continues to evolve as new innovations occur. The driving force behind
this innovation has been the increasing demand for faster and more
powerful computers. A major bottleneck in personal computer speed has
historically been the speed with which data can be accessed from
memory, referred to as the memory access time. The microprocessor,
with its relatively fast processor cycle times, has generally been
delayed by the use of wait states during memory accesses to account
for the relatively slow memory access times. Therefore, improvement in
memory access times has been one of the major areas of research in
enhancing computer performance.

In order to bridge the gap between fast processor cycle times and slow
memory access times, cache memory was developed. A cache is a small
amount of very fast, and expensive, zero wait state memory that is
used to store a copy of frequently accessed code and data from main
memory. The microprocessor can operate out of this very fast memory
and thereby reduce the number of wait states that must be interposed
during memory accesses. When the processor requests data from memory
and the data resides in the cache, then a cache read hit takes place,
and the data from the memory access can be returned to the processor
from the cache without incurring wait states. If the data is not in
the cache, then a cache read miss takes place, and the memory request
is forwarded to the system and the data is retrieved from main memory,
as would normally be done if the cache did not exist. On a cache miss,
the data that is retrieved from memory is provided to the processor
and is also written into the cache due to the statistical likelihood
that this data will be requested again by the processor.

An efficient cache yields a high "hit rate", which is the percentage
of cache hits that occur during all memory accesses. When a cache has
a high hit rate, the majority of memory accesses are serviced with
zero wait states. The net effect of a high cache hit rate is that the
wait states incurred on a relatively infrequent miss are averaged over
a large number of zero wait state cache hit accesses, resulting in an
average of nearly zero wait states per access. Also, since a cache is
usually located on the local bus of the microprocessor, cache hits are
serviced locally without requiring use of the system bus. Therefore, a
processor operating out of its local cache has a much lower "bus
utilization." This reduces system bus bandwidth used by the processor,
making more bandwidth available for other bus masters.
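
The averaging argument above can be illustrated with a short sketch.
The function name and the sample figures (a 95 percent hit rate and
four wait states per miss) are assumptions chosen for illustration
only; they do not appear in the text.

```python
def average_wait_states(hit_rate: float, miss_wait_states: float) -> float:
    """Mean wait states per access when every hit costs zero wait states."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must lie between 0 and 1")
    # Hits contribute nothing; only the miss fraction incurs wait states.
    return (1.0 - hit_rate) * miss_wait_states


if __name__ == "__main__":
    # Hypothetical figures: 95% hit rate, 4 wait states on each miss.
    print(average_wait_states(0.95, 4))
```

With these assumed figures the average is a fraction of one wait state
per access, which is the sense in which a high hit rate yields "nearly
zero" wait states.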

Another important feature of caches is that the processor can operate
out of its local cache when it does not have control of the system
bus, thereby increasing the efficiency of the computer system. In
systems without microprocessor caches, the processor generally must
remain idle while it does not have control of the system bus. This
reduces the overall efficiency of the computer system because the
processor cannot do any useful work at this time. However, if the
processor includes a cache placed on its local bus, it can retrieve
the necessary code and data from its cache to perform useful work
while other devices have control of the system bus, thereby increasing
system efficiency.

Cache performance is dependent on many factors, including the hit rate
and the cache memory access time. The hit rate is a measure of how
efficient a cache is in maintaining a copy of the most frequently used
code and data and, to a large extent, it is a function of the size of
the cache. A larger cache will generally have a higher hit rate than a
smaller cache. Increasing the size of the cache, however, can possibly
degrade the cache memory access time. However, cache designs for a
larger cache can be achieved using cache memory with the fastest
possible access times such that the limiting factor in the design is
the minimum CPU access time. In this way, a larger cache would not be
penalized by a possibly slower cache memory access time with respect
to the memory access time of a smaller cache because the limiting
factor in the design would be the minimum CPU access time.

Other important considerations in cache performance are the
organization of the cache and the cache management policies that are
employed in the cache. A cache can generally be organized into either
a direct-mapped or set-associative configuration. In a direct-mapped
organization, the physical address space of the computer is
conceptually divided up into a number of equal pages, with the page
size equaling the size of the cache. The cache is divided up into a
number of sets, with each set having a certain number of lines. Each
of the pages in main memory has a number of lines equivalent to the
number of lines in the cache, and each line from a respective page in
main memory corresponds to a similarly located line in the cache. An
important characteristic of a direct-mapped cache is that each memory
line from a page in main memory, referred to as a page offset, can
only reside in the equivalently located line or page offset in the
cache. Due to this restriction, the cache only need refer to a certain
number of the upper address bits of a memory address, referred to as a
tag, to determine if a copy of the data from the respective memory
address resides in the cache because the lower order address bits are
pre-determined by the page offset of the memory address.

Whereas a direct-mapped cache is organized as one bank of memory that
is equivalent in size to a conceptual page in main memory, a
set-associative cache includes a number of banks, or ways, of memory
that are each equivalent in size to a conceptual page in main memory.
Accordingly, a page offset in main memory can be mapped to a number of
locations in the cache equal to the number of ways in the cache. For
example, in a 4-way set associative cache, a line or page offset from
main memory can reside in the equivalent page offset location in any
of the four ways of the cache.

A set-associative cache generally includes a replacement algorithm
that determines which bank, or way, with which to fill data when a
read miss occurs. Many set-associative caches use some form of a least
recently used (LRU) algorithm that places new data in the way that was
least recently accessed. This is because, statistically, the way most
recently used or accessed to provide data to the processor is the one
most likely to be needed again in the future. Therefore, the LRU
algorithm ensures that the block which is replaced is the least likely
to have data requested by the cache.

Cache management is generally performed by a device referred to as a
cache controller. The cache controller includes a directory that holds
an associated entry for each set in the cache. This entry generally
has three components: a tag, a tag valid bit, and a number of line
valid bits equaling the number of lines in each cache set. The tag
acts as a main memory page number, and it holds the upper address bits
of the particular page in main memory from which the copy of data
residing in the respective set of the cache originated. The status of
the tag valid bit determines whether the data in the respective set of
the cache is considered valid or invalid. If the tag valid bit is
clear, then the entire set is considered invalid. If the tag valid bit
is true, then an individual line within the set is considered valid or
invalid depending on the status of its respective line valid bit.

A principal cache management policy is the preservation of cache
coherency. Cache coherency refers to the requirement that any copy of
data in a cache must be identical to (or actually be) the owner of
that location's data. The owner of a location's data is generally
defined as the respective location having the most recent version of
the data residing in the respective memory location. The owner of data
can be either an unmodified location in main memory, or a modified
location in a write-back cache. In computer systems where independent
bus masters can access memory, there is a possibility that a bus
master, such as a direct memory access controller, network or disk
interface card, or video graphics card, might alter the contents of a
main memory location that is duplicated in the cache. When this
occurs, the cache is said to hold "stale", or invalid, data. In order
to maintain cache coherency, it is necessary for the cache controller
to monitor the system bus when the processor does not own the system
bus to see if another bus master accesses main memory. This method of
monitoring the bus is referred to as snooping.

The cache controller must monitor the system bus during memory reads
by a bus master in a write-back cache design because of the
possibility that a previous processor write may have altered a copy of
data in the cache that has not been updated in main memory. This is
referred to as read snooping. On a read snoop hit where the cache
contains data not yet updated in main memory, the cache controller
generally provides the respective data to main memory, and the
requesting bus master generally reads this data en route from the
cache controller to main memory, this operation being referred to as
snarfing. The cache controller must also monitor the system bus during
memory writes because the bus master may write to or alter a memory
location that resides in the cache. This is referred to as write
snooping. On a write snoop hit, the cache entry is either marked
invalid in the cache directory by the cache controller, signifying
that this entry is no longer correct, or the cache is updated along
with main memory. Therefore, when a bus master reads or writes to main
memory in a write-back cache design, or writes to main memory in a
write-through cache design, the cache controller must latch the system
address and perform a cache look-up in the tag directory corresponding
to the page offset location where the memory access occurred to see if
the main memory location being accessed also resides in the cache. If
a copy of the data from this location does reside in the cache, then
the cache controller takes the appropriate action depending on whether
a read or write snoop hit has occurred. This prevents incompatible
data from being stored in main memory and the cache, thereby
preserving cache coherency.

Another consideration in the preservation of cache coherency is the
handling of processor writes to memory. When the processor writes to
main memory, the memory location must be checked to determine if a
copy of the data from this location also resides in the cache. If a
processor write hit occurs in a write-back cache design, then the
cache location is updated with the new data and main memory may be
updated with the new data at a later time or should the need arise. In
a write-through cache, the main memory location is generally updated
in conjunction with the cache location on a processor write hit. If a
processor write miss occurs, the cache controller may ignore the write
miss in a write-through cache design because the cache is unaffected
in this design. Alternatively, the cache controller may perform a
"write-allocate" whereby the cache controller allocates a new line in
the cache in addition to passing the data to the main memory. In a
write-back cache design, the cache controller generally allocates a
new line in the cache when a processor write miss occurs. This
generally involves reading the remaining entries to fill the line from
main memory before or jointly with providing the write data to the
cache. Main memory is updated at a later time should the need arise.

Caches have generally been designed independently of the
microprocessor. The cache is placed on the local bus of the
microprocessor and interfaced between the processor and the system bus
during the design of the computer system. However, with the
development of higher transistor density computer chips, many
processors are currently being designed with an on-chip cache in order
to meet performance goals with regard to memory access times. The
on-chip cache used in these processors is generally small, an
exemplary size being 8 kbytes in size. The smaller, on-chip cache is
generally faster than a large off-chip cache and reduces the gap
between fast processor cycle times and the relatively slow access
times of large caches.

In computer systems that utilize processors with on-chip caches, an
external, second level cache is often added to the system to further
improve memory access time. The second level cache is generally much
larger than the on-chip cache, and, when used in conjunction with the
on-chip cache, provides a greater overall hit rate than the on-chip
cache would provide by itself.

In systems that incorporate multiple levels of caches, when the
processor requests data from memory, the on-chip or first level cache
is first checked to see if a copy of the data resides there. If so,
then a first level cache hit occurs, and the first level cache
provides the appropriate data to the processor. If a first level cache
miss occurs, then the second level cache is then checked. If a second
level cache hit occurs, then the data is provided from the second
level cache to the processor. If a second level cache miss occurs,
then the data is retrieved from main memory. Write operations are
similar, with mixing and matching of the operations discussed above
being possible.
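
The read path just described can be sketched as follows. This is a
deliberately simplified model, not the circuitry of any particular
system: each cache is represented by a plain dictionary keyed by
address, so sets, ways and replacement are ignored. The function name
`read` and the choice to fill the first level cache on a second level
hit are assumptions made for illustration.

```python
def read(address, c1, c2, main_memory):
    """Return (data, source) for a processor read through two cache levels."""
    if address in c1:
        return c1[address], "first level hit"
    if address in c2:
        # Assumed policy: a second level hit also fills the first level.
        c1[address] = c2[address]
        return c1[address], "second level hit"
    # Both levels missed: fetch from main memory and fill both caches.
    data = main_memory[address]
    c2[address] = data
    c1[address] = data
    return data, "miss"


if __name__ == "__main__":
    c1, c2, memory = {}, {}, {0x100: 7}
    print(read(0x100, c1, c2, memory))
    print(read(0x100, c1, c2, memory))
```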

In multilevel cache systems, it has generally been necessary for each
cache to snoop the system bus during memory writes by other bus
masters in order to maintain cache coherency. When the microprocessor
does not have control of the system bus, the cache controllers of both
the first level and second level caches are required to latch the
address of every memory write and check this address against the tags
in its cache directory. This considerably impairs the efficiency of
the processor working out of its on-chip cache during this time
because it is continually being interrupted by the snooping efforts of
the cache controller of the on-chip cache. Therefore, the requirement
that the cache controller of the on-chip cache snoop the system bus
for every memory write degrades system performance because it prevents
the processor from efficiently operating out of its on-chip cache
while it does not have control of the system bus.

In many instances where multilevel cache hierarchies exist with
multiple processors, a property referred to as multilevel inclusion is
desired in the hierarchy. Multilevel inclusion provides that the
second level cache is guaranteed to have a copy of what is inside the
first level, or on-chip cache. When this occurs, the second level
cache is said to hold a superset of the first level cache. Multilevel
inclusion has mostly been used in multi-processor systems to prevent
cache coherency problems. When multilevel inclusion is implemented in
multi-processor systems, the higher level caches can shield the lower
level caches from cache coherency problems and thereby prevent
unnecessary blind checks and invalidations that would otherwise occur
in the lower level caches if multilevel inclusion were not
implemented.

The present invention includes a method for achieving multilevel
inclusion among first and second level caches in a computer system.
Multilevel inclusion obviates the necessity of the cache controller of
the first level cache to snoop the system bus for every memory write
that occurs while the processor is not in control of the system bus
because the cache controller of the second level cache can assume this
duty for both caches. This frees up the first level cache controller
and thereby allows the microprocessor to operate more efficiently out
of the first level cache when it does not have control of the system
bus.

The second level cache preferably has a number of ways equal to or
greater than the number of ways in the first level cache. The first
level and second level caches are 4-way set associative caches in the
preferred embodiment of the present invention. In this embodiment
there is a one-to-one correspondence between the cache ways in the
first level cache and the cache ways in the second level cache. During
a first level cache line fill from main memory, the first level cache
controller communicates to the second level cache controller the
particular first level cache way in which the data is to be placed so
that the second level cache controller can place the data in the
corresponding second level cache way. When the second level cache
controller is transmitting a copy of data to the first level cache
controller, the second level cache controller informs the first level
cache controller which second level cache way the data is coming from.
The first level cache controller disregards its normal replacement
algorithm and fills the corresponding first level cache way. In this
manner, the first and second level caches align themselves on a "way"
basis. This "way" alignment prevents the second level cache controller
from placing data in a different way than the first level cache and in
the process possibly discarding data that resides in the first level
cache.

The cache organization of the first level cache according to the
present invention is a write-through architecture. On a processor
write, the information is preferably written to the first level cache,
regardless of whether a write hit or write miss occurs, and external
write bus cycles are initiated which write the information to the
second level cache. The first level cache broadcasts the particular
first level cache way where the data was placed to the second level
cache controller so that the second level cache controller can place
the data in the corresponding second level cache way, thereby
retaining the "way" alignment. The second level cache is preferably a
write-back cache according to the preferred embodiment, but could be a
write-through cache if desired.

The second level cache controller utilizes an inclusion bit with
respect to each line of data in the second level cache in order to
remember whether a copy of this data also resides in the first level
cache. When a location in the first level cache is replaced, whether
concurrently with a second level cache replacement from memory or
directly from the second level cache, the second level cache
controller sets an inclusion bit for that location in the second level
cache to signify that a copy of this data is duplicated in the first
level cache. When this occurs, all other locations in the second level
cache that correspond to the same location in the first level cache
have their inclusion bits cleared by the second level cache controller
to signify that the data held in these locations does not reside in
the first level cache.

The second level cache controller performs the principal snooping
duties for both caches when the processor does not have control of the
system bus. When a write snoop hit occurs in the second level cache,
the inclusion bit is read by the second level cache controller to see
whether the first level cache controller must also snoop the memory
access. If the inclusion bit is not set, then the first level cache
controller is left alone. If the inclusion bit is set, then the second
level cache controller directs the first level cache controller to
snoop that particular memory access. In this manner, the first level
cache controller can neglect its snooping duties until the second
level cache controller determines that a write snoop hit on the first
level cache has actually occurred. This allows the processor to
operate more efficiently out of its first level cache when it does not
have control of the system bus.
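
The snoop filtering role of the inclusion bit can be sketched as
follows. The class and function names are hypothetical, the second
level cache is modelled as a dictionary keyed by line address, and the
action the second level cache itself takes on a write snoop hit
(invalidate or update) is deliberately left out; the sketch only
decides whether the first level cache controller needs to be
disturbed.

```python
class C2Line:
    """One second level cache line with its inclusion bit (illustrative)."""

    def __init__(self, inclusion=False):
        self.valid = True
        self.inclusion = inclusion  # True when C1 also holds a copy


def c1_must_snoop(c2_lines, address):
    """Return True when a bus-master write must be forwarded to C1.

    c2_lines maps line addresses to C2Line objects (an assumed model).
    """
    line = c2_lines.get(address)
    if line is None or not line.valid:
        # Write snoop miss in C2: under multilevel inclusion C1 cannot
        # hold the line either, so C1 is left alone.
        return False
    # Write snoop hit in C2: forward to C1 only if the inclusion bit is set.
    return line.inclusion


if __name__ == "__main__":
    lines = {0x1000: C2Line(inclusion=True), 0x2000: C2Line(inclusion=False)}
    for addr in (0x1000, 0x2000, 0x3000):
        print(hex(addr), c1_must_snoop(lines, addr))
```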

A better understanding of the invention can be obtained when the
following detailed description of the preferred embodiment is
considered in conjunction with the following drawings, in which:

Figure 1 is a block diagram of a computer system
including first and second level caches and
implementing multilevel inclusion according to the
present invention;
Figure 2 depicts the organization of the 2-way set
associative C1 cache of Figure 1;
Figure 3 depicts the organization of the 2-way set
associative C2 cache of Figure 1;
Figures 4A and 4B depict a flowchart illustrating
the operation of cache read hits and misses
according to the present invention; and
Figure 5 is a flowchart illustrating the operation of
read and write snooping according to the present
invention.

Referring now to Figure 1, a computer system S is generally shown.
Many of the details of a computer system that are not relevant to the
present invention have been omitted for the purpose of clarity. The
computer system S includes a microprocessor 20 that is connected to a
first level cache C1 that is preferably located on the same chip 22 as
the processor 20. The chip 22 includes a C1 cache controller 30 that
is connected to the C1 cache and controls the operation of the C1
cache. The processor 20, the first level cache C1, and the first level
cache controller 30 are connected to a system bus 24 through a local
processor bus 25. A second level cache C2 is connected to the local
processor bus 25. A second level cache controller, referred to as the
C2 cache controller 32, is connected to the C2 cache and the local
processor bus 25. Random access memory 26, which is 4 Gigabytes in
size according to the present embodiment, and an intelligent bus
master 28 are connected to the system bus 24. The random access memory
(RAM) 26 includes a system memory controller (not shown) that controls
the operation of the RAM 26. The RAM 26 and the system memory
controller (not shown) are hereinafter referred to as main memory 26.
The system bus 24 includes a data bus and a 32-bit address bus, the
address bus including address bits A2 to A31, which allows access to
any of 2^30 32-bit doublewords in main memory 26. The bus master 28
may be any of the type that controls the system bus 24 when the
processor system is on hold, such as the system direct memory access
(DMA) controller, a hard disk interface, a local area network (LAN)
interface or a video graphics processor system.

The C1 and C2 caches are aligned on a "way" basis such that a copy of
data placed in a particular way in one of the caches can only be
placed in a predetermined corresponding way in the other cache. This
"way" alignment requires that the C2 cache have at least as many cache
ways as does the C1 cache. If the C1 and C2 caches have the same
number of ways, then there is a one-to-one correspondence between the
cache ways in the C1 cache and the cache ways in the C2 cache. If the
C2 cache has more cache ways than the C1 cache, then each cache way in
the C1 cache corresponds to one or more cache ways in the C2 cache.
However, no two C1 cache ways can correspond to the same C2 cache way.
This requirement stems from the fact that each memory address has only
one possible location in each of the C1 and C2 caches. Accordingly, if
two C1 cache ways corresponded to a single C2 cache way, then there
would be memory address locations residing in the C1 cache that would
be incapable of residing in the C2 cache. The respective C2 cache way
location would be incapable of holding the two memory addresses which
would reside in each of the respective C1 cache ways that corresponded
to the respective C2 cache way location.
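
The constraints on the way correspondence stated above can be checked
mechanically. The following is a minimal sketch; the function name and
the representation of the correspondence as a dictionary from each C1
way number to a set of C2 way numbers are assumptions made for
illustration.

```python
def is_valid_way_mapping(c1_to_c2, c2_way_count):
    """Check a proposed C1-to-C2 way correspondence.

    c1_to_c2 maps each C1 way number to the set of C2 way numbers it
    corresponds to (a hypothetical representation).
    """
    claimed = set()
    for c2_ways in c1_to_c2.values():
        if not c2_ways:
            return False  # every C1 way needs at least one C2 way
        for way in c2_ways:
            if not 0 <= way < c2_way_count:
                return False  # refers to a C2 way that does not exist
            if way in claimed:
                return False  # two C1 ways would share one C2 way
            claimed.add(way)
    return True


if __name__ == "__main__":
    print(is_valid_way_mapping({0: {0}, 1: {1}}, 2))        # one-to-one
    print(is_valid_way_mapping({0: {0, 1}, 1: {2, 3}}, 4))  # one-to-many
    print(is_valid_way_mapping({0: {0}, 1: {0}}, 2))        # shared way
```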

The actual size of each of the caches is not important for the
purposes of the invention. However, the C2 cache must be at least as
large as the C1 cache to achieve multilevel inclusion, and the C2
cache is preferably at least four times as large as the C1 cache to
provide for an improved cache hit rate. In the preferred embodiment of
the present invention, the C1 cache is 8 kbytes in size and the C2
cache is preferably 512 kbytes in size. In this embodiment, the C1
cache and the C2 cache are each 4-way set associative caches. In an
alternate embodiment of the present invention, the C1 and C2 caches
are each 2-way set-associative caches.

Referring now to Figures 2 and 3, conceptual diagrams of the C1 and C2
caches with their respective cache controllers 30 and 32 configured in
a 2-way set-associative organization are generally shown. The
following discussion is intended to provide an introduction to the
structure and operation of a set-associative cache as well as the
relationship between the cache memory, cache directories, and main
memory 26. The C1 and C2 caches are discussed with reference to a
2-way set-associative cache organization as a simpler example of the
more complex 4-way set-associative cache organization of the preferred
embodiment. The special cache controller design considerations that
arise in a 4-way set-associative cache organization that do not occur
in a 2-way set-associative organization are noted in the following
discussion.

The C1 cache includes two banks or ways of memory, referred to as A1
and B1, which are each 4 kbytes in size. Each of the cache ways A1 and
B1 is organized into 128 sets, with each set including eight lines 58
of memory storage. Each line includes one 32-bit doubleword, or four
bytes of memory. Main memory 26 is conceptually organized as 2^20
pages with a page size of 4 kbytes, which is equivalent to the size of
each C1 cache way A1 and B1. Each conceptual page in main memory 26
includes 1024 lines, which is the same number of lines as have each of
the cache ways A1 and B1. The unit of transfer between the main memory
26 and the C1 cache is one line.

A particular line location, or page offset, from each of the pages in
main memory 26 maps to the similarly located line in each of the cache
ways A1 and B1. For example, as shown in Figure 2, the page offset
from each of the pages in main memory 26 that is shaded maps to the
equivalently located, and shaded, line offset in each of the cache
ways A1 and B1. In this way, a particular page offset memory location
from main memory 26 can only map to one of two locations in the C1
cache, these locations being in each of the cache ways A1 and B1.

Each of the cache ways A1 and B1 includes a cache directory, referred
to as directory DA1 and directory DB1, respectively, that are located
in the C1 cache controller 30 of the C1 cache. The directories DA1 and
DB1 each include one entry 60 and 62, respectively, for each of the
128 sets in the respective cache way A1 and B1. The cache directory
entry for each set has three components: a tag, a tag valid bit, and
eight line valid bits, as shown. The number of line valid bits equals
the number of lines in each set. The 20 bits in the tag field hold the
upper address bits, address bits A12 to A31, of the main memory
address location of the copy of data that resides in the respective
set of the cache. The upper address bits address the appropriate 4
kbyte conceptual page in main memory 26 where the data in the
respective set of the cache is located. The remaining address bits
from this main memory address location, address bits A2 to A11, can be
partitioned into a set address field comprising seven bits, A5 to A11,
which are used to select one of the 128 sets in the C1 cache, and a
line address field comprising 3 bits, A2 to A4, which are used to
select an individual line from the eight lines in the selected set.
Therefore, the lower address bits A2 through A11 serve as the "cache
address" which directly selects one of the line locations in each of
the ways A1 and B1 of the C1 cache.
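
The partitioning of a 32-bit address into tag, set address and line
address fields described above can be written out directly. The
function name and the sample address are assumptions for illustration;
the field boundaries (A12 to A31, A5 to A11, A2 to A4) are those given
in the text.

```python
def split_c1_address(address: int):
    """Split a 32-bit byte address into the C1 directory fields."""
    tag = (address >> 12) & 0xFFFFF      # A12..A31: 20-bit page number
    set_index = (address >> 5) & 0x7F    # A5..A11: one of 128 sets
    line_index = (address >> 2) & 0x7    # A2..A4: one of eight lines
    return tag, set_index, line_index


if __name__ == "__main__":
    # Hypothetical address chosen only to exercise all three fields.
    tag, set_index, line_index = split_c1_address(0x12345ABC)
    print(hex(tag), set_index, line_index)
```

The two lowest address bits are not used here because each line holds
one 32-bit doubleword.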

When the microprocessor initiates a memory read cycle, the address
bits A5 to A11 are used to select one of the 128 sets, and the address
bits A2 to A4 are used to select one of the respective line valid bits
within each entry in the respective directories DA1 and DB1 from the
selected set. The lower address bits A2 to A11 are also used to select
the appropriate line in the C1 cache. The cache controller compares
the upper address bit tag field of the requested memory address with
each of the tags stored in the selected directory entries of the
selected set for each of the cache ways A1 and B1. At the same time,
both the tag valid and line valid bits are checked. If the upper
address bits match one of the tags, and if both the tag valid bit and
the appropriate line valid bits are set for the respective cache way
directory where the tag match was made, the result is a cache hit, and
the corresponding cache way is directed to drive the selected line of
data onto the data bus.

A miss can occur in either of two ways. The first is known as a line
miss and occurs when the upper address bits of the requested memory
address match one of the tags in either of the directories DA1 or DB1
of the selected set and the respective tag valid bit is set but the
respective line valid bit(s) where the requested data resides are
clear. The second is called a tag miss and occurs when either the
upper address bits of the requested memory address do not match either
of the respective tags in directories DA1 or DB1 of the selected set
where the requested data is located, or the respective tag valid bit
for each of the directories DA1 or DB1 is not set.
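
The hit, line miss and tag miss cases can be summarized in a short
sketch. The names `DirectoryEntry` and `classify_read` are
hypothetical, and the directory entries for the selected set are
passed in as a simple list with one entry per way; this is an
illustrative model under those assumptions, not the controller logic
itself.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    """One directory entry: a tag, a tag valid bit, eight line valid bits."""
    tag: int = 0
    tag_valid: bool = False
    line_valid: list = field(default_factory=lambda: [False] * 8)


def classify_read(entries, tag, line_index):
    """Classify a read against the entries of the selected set.

    entries holds one DirectoryEntry per way (e.g. DA1 and DB1).
    Returns (outcome, way), where way is None on a tag miss.
    """
    for way, entry in enumerate(entries):
        if entry.tag_valid and entry.tag == tag:
            if entry.line_valid[line_index]:
                return "hit", way
            return "line miss", way   # tag matches but the line is invalid
    return "tag miss", None           # no valid entry carries this tag


if __name__ == "__main__":
    da1 = DirectoryEntry(tag=5, tag_valid=True,
                         line_valid=[True] + [False] * 7)
    db1 = DirectoryEntry()
    print(classify_read([da1, db1], 5, 0))
    print(classify_read([da1, db1], 5, 1))
    print(classify_read([da1, db1], 9, 0))
```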

The C1 cache controller 30 includes a replacement algorithm that
determines which cache way, A1 or B1, in which to place new data. The
replacement algorithm used is a least recently used (LRU) algorithm
that places new data in the cache way that was least recently accessed
by the processor for data. This is because, statistically, the way
most recently used is the way most likely to be needed again in the
near future. The C1 cache controller 30 includes a directory 70 that
holds an LRU bit for each set in the cache, and the LRU bit is pointed
away from the cache way that was most recently accessed by the
processor. Therefore, if data requested by the processor resides in
way A1, then the LRU bit is pointed toward B1. If the data requested
by the processor resides in way B1, then the LRU bit is pointed toward
A1.
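
A minimal sketch of the per-set LRU bit for the 2-way organization
follows. The class name, the encoding (0 points toward A1, 1 points
toward B1) and the initial value of the bits are assumptions; the text
specifies only that the bit points away from the most recently
accessed way.

```python
class TwoWayLru:
    """One LRU bit per set for a 2-way cache (way 0 = A1, way 1 = B1)."""

    def __init__(self, set_count=128):
        # Assumed encoding: 0 points toward A1, 1 points toward B1.
        # The initial value is arbitrary and not taken from the text.
        self.bits = [0] * set_count

    def touch(self, set_index, way):
        """Record an access: point the bit away from the accessed way."""
        if way not in (0, 1):
            raise ValueError("way must be 0 (A1) or 1 (B1)")
        self.bits[set_index] = 1 - way

    def victim(self, set_index):
        """Return the way the LRU bit points toward (the one to replace)."""
        return self.bits[set_index]


if __name__ == "__main__":
    lru = TwoWayLru()
    lru.touch(3, 0)          # processor data came from A1
    print(lru.victim(3))     # bit now points toward B1
```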

In the 4-way set-associative C1 cache organization of the preferred
embodiment, a more elaborate LRU or pseudo-LRU replacement algorithm
can be used in the C1 cache controller 30. The choice of a replacement
algorithm is generally irrelevant to the present invention, and it is
suggested that an LRU or pseudo-LRU algorithm be chosen to optimize
the particular cache design used in the chosen embodiment. One
replacement algorithm that can be used in the C1 cache controller 30
in the 4-way set-associative C1 cache organization of the preferred
embodiment is a pseudo-LRU algorithm which operates as follows. The
4-way set-associative C1 cache includes four ways of memory referred
to as W0, W1, W2, and W3. Three bits, referred to as X0, X1, and X2,
are located in the C1 cache controller 30 and are defined for a
respective set in each of the ways in the 4-way C1 cache. These bits
are called LRU bits and are updated for every hit or replace in the C1
cache. If the most recent access in the respective set was to way W0
or way W1, then X0 is set to 1 or a logic high value. Bit X0 is set to
0 or a logic low value if the most recent access was to way W2 or way
W3. If X0 is set to 1 and the most recent access between way W0 and
way W1 was to way W0, then X1 is set to 1, otherwise X1 is set to 0.
If X0 is set to 0 and the most recent access between way W2 and way W3
was to way W2, then X2 is set to 1, otherwise X2 is set to 0.

The pseudo-LRU replacement mechanism works in the following manner.
When a line must be replaced in the 4-way C1 cache, the C1 cache
controller 30 uses the X0 bit to first select the respective ways W0
and W1 or W2 and W3 where the particular line location candidate that
was least recently used is located. The C1 cache controller then
utilizes the X1 and X2 bits to determine which of the two selected
cache ways W0 and W1 or W2 and W3 holds the respective line location
that was least recently used, and this line location is marked for
replacement.
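
The three-bit pseudo-LRU scheme can be sketched as follows for a single set. The class name PseudoLRU4 and the all-zero initial state are assumptions; the meaning of X0, X1 and X2 follows the description above.

```python
class PseudoLRU4:
    """Pseudo-LRU state (bits X0, X1, X2) for one set of a 4-way cache."""

    def __init__(self):
        # Assumed reset state: all bits low.
        self.x0 = self.x1 = self.x2 = 0

    def touch(self, way):
        """Update the LRU bits on a hit or replace in way 0..3."""
        if way in (0, 1):
            self.x0 = 1                      # most recent pair is W0/W1
            self.x1 = 1 if way == 0 else 0   # most recent within W0/W1
        else:
            self.x0 = 0                      # most recent pair is W2/W3
            self.x2 = 1 if way == 2 else 0   # most recent within W2/W3

    def victim(self):
        """Return the way marked for replacement."""
        if self.x0 == 1:
            # W0/W1 were used most recently, so replace within W2/W3.
            return 3 if self.x2 == 1 else 2
        # W2/W3 were used most recently, so replace within W0/W1.
        return 1 if self.x1 == 1 else 0
```

Because only three bits are kept, the choice is an approximation: after accesses to W0, W1, W2, W3 and then W0 again, the sketch selects W2 even though W1 is the true least recently used way.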

The C1 cache controller 30 broadcasts its LRU information to the C2
cache controller 32 on C1 and C2 cache read misses and on processor
writes according to the present invention. In this manner, the C2 cache
controller 32 is able to place the copy of data that it receives from
either the main memory 26 on read misses or from the processor 20 on
processor writes into the C2 cache way corresponding to the C1 cache
way where the C1 cache controller placed the copy of data, thereby
achieving multilevel inclusion. In addition, the C1 cache controller 30
ignores its LRU replacement algorithm on a C1 cache read miss and a C2
cache read hit so that the C1 cache controller 30 can place the copy of
data that it receives from the C2 cache controller 32 in the C1 cache
way corresponding to the C2 cache way where the read hit occurred.

The 2-way set-associative C2 cache is organized in a manner similar to
that of the 2-way set-associative C1 cache. In the preferred embodiment
the C2 cache preferably comprises 512 kbytes of cache data RAM.
Referring now to Figure 3, each cache way A2 and B2 in the C2 cache is
256 kbytes in size and includes 8192 sets of eight lines each. The line
size in the C2 cache is one 32-bit doubleword, which is the same as
that of the C1 cache. The 4 Gigabyte main memory 26 is organized into
2^14 conceptual pages with each conceptual page being 256 kbytes in
size. The number of conceptual pages of main memory 26 for the C2 cache
is less than that of the C1 cache because the conceptual page size for
the C2 cache is greater than that of the C1 cache. As in the C1 cache,
each line location or page offset in main memory 26 maps to a similarly
located line in each of the cache ways A2 and B2.

The C2 cache controller 32 includes cache way directories DA2 and DB2.
The cache way directories DA2 and DB2 have set entries which include
14-bit tag fields, as opposed to the 20-bit tag fields in the entries
of the C1 cache directories DA1 and DB1. The 14-bit tag fields hold the
upper address bits, address bits A18 to A31, that address the
appropriate 256 kbyte conceptual page in main memory 26 where the data
in the respective set of the cache is located. The remaining address
bits, A2 to A17, can be partitioned into a set address field comprising
thirteen bits, A5 to A17, which are used to select one of the 8192 sets
in the C2 cache, and a line address field comprising 3 bits, A2 to A4,
which are used to select an individual line from the eight lines in the
selected set. Therefore, in the C2 cache the lower address bits A2 to
A17 serve as the "cache address" which directly selects one of the line
locations in each of the ways A2 and B2 of the C2 cache.
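
The partitioning of a 32-bit address into the C2 tag, set address and line address fields can be illustrated with a short sketch. The function name split_c2_address is hypothetical; the bit positions are those given above.

```python
def split_c2_address(addr):
    """Split a 32-bit byte address into (tag, set_index, line) for the C2 cache."""
    line = (addr >> 2) & 0x7           # A2..A4: one of eight lines in the set
    set_index = (addr >> 5) & 0x1FFF   # A5..A17: one of 8192 sets
    tag = (addr >> 18) & 0x3FFF        # A18..A31: one of 2^14 conceptual pages
    return tag, set_index, line
```

For example, an address built as (0x1234 << 18) | (100 << 5) | (5 << 2) splits into tag 0x1234, set 100 and line 5. Bits A0 and A1 select a byte within the 32-bit doubleword and take no part in the lookup.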

The C2 cache controller 32 according to the present invention does not
generally require a replacement algorithm because the C2 cache receives
new data only on C1 and C2 cache read misses and on processor writes,
and in those instances the C2 cache controller receives the way
location from the C1 cache controller and must fill the corresponding
C2 cache way. Therefore, the C2 cache controller 32 does not need a
replacement algorithm because the respective C2 cache way where data is
placed is determined by the data's way location in the C1 cache.
However, if the C2 cache has more ways than has the C1 cache, then the
C2 cache controller 32 will require use of a replacement algorithm. In
this instance, a C1 cache way will correspond to two or more C2 cache
ways. Accordingly, when the C1 cache controller 30 broadcasts the C1
cache way location to the C2 cache controller 32, the C2 cache
controller 32 will need a replacement algorithm in order to decide
between the multiple C2 cache ways that correspond to the C1 cache way
location in which to place the received data.

The 2-way set-associative C1 and C2 caches are aligned on a "way" basis
such that the ways A1 and B1 in the C1 cache have a one-to-one
correspondence with the ways A2 and B2, respectively, of the C2 cache.
In this manner, a page offset from main memory 26 that is placed in the
respective line location in a C1 cache way A1 or B1 has only one
possible location in the corresponding C2 cache way A2 or B2,
respectively. Conversely, a respective line location in a C2 cache way
A2 or B2 has only one possible location in the corresponding C1 cache
way A1 or B1, respectively. However, because the C2 cache is 64 times
as large as the C1 cache, each of the C2 cache ways A2 or B2 holds 64
lines of data that each correspond to, or could be located in, a single
line or page offset location in the corresponding C1 cache way A1 or
B1. Therefore, the C2 cache controller 32 according to the present
invention includes inclusion bits 80 for each of its respective lines.
This enables the C2 cache controller 32 to remember whether a copy of
data from the respective C2 cache line also resides in the
corresponding C1 cache line location.

The use of inclusion bits 80 allows the C2 cache controller 32 to
remember which of the 64 lines of data in the respective C2 cache way
A2 or B2 that corresponds to a single C1 cache way location holds a
copy of data that is duplicated in that C1 cache location. For example,
if a line in the C2 cache receives a copy of data from main memory 26
that was also placed in the C1 cache, or if a line in the C2 cache
provides a copy of data that is placed in the C1 cache, then an
inclusion bit for the respective C2 cache line is true or set to a
logic high value, signifying that the respective C2 cache line holds a
copy of data that is duplicated in the respective C1 cache location.
The other 63 line locations in the C2 cache which correspond to the
respective C1 cache location involved in the above operation have their
inclusion bits cleared as a reminder that the copy of data that they
hold is not duplicated in a C1 cache location. This is important
because one of these other 63 line locations may hold data that was
previously duplicated in the respective C1 cache location before one of
the operations mentioned above placed new data in the respective C1
cache location, and therefore one of these 63 locations may have its
inclusion bit set. The only instance where one of these other 63 C2
cache locations would not have its inclusion bit set is when the
respective C2 cache line location that was involved in the above
operation and had its inclusion bit set also held the copy of data that
was duplicated in the respective C1 cache location before the operation
took place and therefore already had its inclusion bit set.
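
The bookkeeping for the inclusion bits can be sketched as follows. The class name InclusionBits is hypothetical, and the sketch assumes that the 64 C2 line locations corresponding to one C1 line location are those sharing the same low-order line index, which follows from the direct mapping of page offsets within each way but is not spelled out in this form above.

```python
C2_LINES_PER_WAY = 8192 * 8                   # 8192 sets of eight lines
C1_LINES_PER_WAY = C2_LINES_PER_WAY // 64     # C2 is 64 times as large as C1


class InclusionBits:
    """One inclusion bit per C2 line location, for each of the two ways."""

    def __init__(self):
        self.bits = [[False] * C2_LINES_PER_WAY for _ in range(2)]

    def mark_in_c1(self, way, c2_line):
        """Record that the data at c2_line was just placed in the C1 cache."""
        c1_line = c2_line % C1_LINES_PER_WAY
        # Clear the bits of all 64 C2 locations that share this C1 location...
        for alias in range(c1_line, C2_LINES_PER_WAY, C1_LINES_PER_WAY):
            self.bits[way][alias] = False
        # ...then set the bit of the one location that is now duplicated in C1.
        self.bits[way][c2_line] = True
```

At most one of the 64 corresponding C2 locations in a way has its inclusion bit set at any time, which mirrors the fact that the single C1 location can hold only one of them.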

Referring now to Figures 4A and 4B, a flowchart describing the
operation of the C1 and C2 caches according to the present invention is
shown. It is understood that numerous of these operations may occur
concurrently, but a flowchart format has been chosen to simplify the
explanation of the operation. For clarity, the flowchart is shown in
two portions, with the interconnections between Figures 4A and 4B
designated by reference to the circled numbers one and two. Step 100
represents that the computer system S is operating or turned on. In
some computer systems, the processor is required to have control of the
system bus 24 before it may issue memory reads or writes. However, in
the system S according to the preferred embodiment the processor 20 is
not required to have control of the system bus 24 when it issues memory
reads or writes but rather the processor 20 can operate out of the C1
cache and the C2 cache without requiring use of the system bus 24 until
a C1 and C2 cache read miss or a processor write beyond any posting
depth occurs.

When the processor 20 attempts a main memory read in step 102, the C1
cache controller 30 first checks the C1 cache in step 104 to determine
if a copy of the requested main memory data resides in the C1 cache. If
a copy of the requested data does not reside in the C1 cache, then a C1
cache read miss occurs in step 106, and the read operation is passed on
to the C2 cache, where the C2 cache controller 32 then checks the C2
cache in step 108. If a copy of the requested data does not reside in
the C2 cache, then a C2 cache read miss occurs in step 110, and the
operation is passed on to the system memory controller to obtain the
necessary data from main memory 26.

Main memory 26 provides the requested data to the C1 cache, the C2
cache and the processor 20 in step 112, and the C1 cache controller 30
places the data into one of its cache ways A1 or B1 according to its
particular replacement algorithm in step 114. The data is placed in the
C1 cache because of the statistical likelihood that this data will be
requested again soon by the processor 20. The C1 cache controller 30
during this period has been broadcasting to the C2 cache controller 32
the particular C1 cache way A1 or B1 in which it is placing the data,
represented in step 118, so that the C2 cache controller 32 can place
the data in the corresponding C2 cache way A2 or B2 in step 120. The C2
cache controller 32 sets the inclusion bit on the respective C2 cache
memory location where the data is stored in step 122, signifying that a
copy of the data in this location also resides in the C1 cache. The C2
cache controller 32 also clears the inclusion bits on the other 63 C2
cache locations that correspond to the same page offset location in the
C1 cache in step 124 to signify that a copy of the data in these
locations does not reside in the C1 cache. Upon completion of the
memory read, the computer system returns to step 100.

The above sequence of events occurs on a C1 and C2 cache read miss and
also when the computer system S is first turned on because the C1 and
C2 caches are both empty at power on of the computer system S and C1
and C2 cache misses are therefore guaranteed. The majority of processor
memory reads that occur immediately after power on of the computer
system S will be C1 and C2 cache misses because the C1 and C2 caches
are relatively empty at this time. In this manner, the C1 and C2 caches
are filled with data and align themselves on a "way" basis wherein data
in a particular way A1 or B1 in the C1 cache is guaranteed to be
located in the corresponding cache way A2 or B2 in the C2 cache. In
addition, when the computer system S has been operating for a while and
a C1 and C2 cache read miss occurs, the resulting line fills of data in
the C1 and C2 caches are performed as described above and therefore the
"way" alignment is maintained.

When the processor 20 initiates a main memory read in step 102 and the
C2 cache controller 32 checks the C2 cache in step 108 after a C1 cache
miss occurs in step 106, and a copy of the requested data resides in
the C2 cache, then a C2 cache hit occurs in step 130. The C2 cache
controller 32 provides the requested data to the processor 20 in step
132, and also provides the data to the C1 cache in step 134 due to the
statistical likelihood that this data will be requested again soon by
the processor 20. The C2 cache controller 32 informs the C1 cache
controller 30 as to the particular C2 cache way A2 or B2 in which the
data is located in the C2 cache in step 136 so that the C1 cache
controller 30 can place the data in the corresponding C1 cache way A1
or B1 in step 138. This requires that the C1 cache controller 30
disregard its normal LRU replacement algorithm because the replacement
algorithm may choose a different C1 cache way A1 or B1 in which to
place the data. In this manner, the C1 and C2 caches maintain their
"way" alignment without a requirement for the C2 cache controller 32 to
transfer data between the ways in the C2 cache. The C2 cache controller
32 sets the inclusion bit on the C2 cache location where the requested
data is located in step 140, signifying that a copy of this data also
resides in the C1 cache. The C2 cache controller 32 also clears the
other 63 inclusion bits on the C2 cache memory locations that
correspond to the same page offset location to signify that a copy of
the data in these locations does not reside in the C1 cache. The
computer system S is then finished with the memory read and returns to
step 100.

When the processor 20 initiates a memory read in step 102 and checks
the contents of the C1 cache in step 104 to determine if a copy of the
requested data resides there, and a copy of the requested data does
reside in the C1 cache, then a C1 cache hit takes place in step 150.
The C1 cache controller 30 provides the requested data to the processor
20 in step 152, and operation of the computer system S is resumed in
step 100. Since multilevel inclusion exists in the cache subsystem, the
C2 cache is guaranteed to have a copy of the data that the C1 cache
controller 30 provided to the processor 20, and no transfer of data
from the C1 cache controller 30 to the C2 cache controller 32 is
necessary when a C1 cache read hit takes place.
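
The three read cases can be tied together in a small behavioral sketch. It is an illustration under stated assumptions, not the circuit of the preferred embodiment: the class TwoLevelCache and its fields are hypothetical, each way is modeled as a dictionary keyed by line location, addresses are doubleword line addresses, a location is found by taking the line address modulo the number of lines per way, one LRU bit is kept per C1 set of eight lines, and write-back of displaced data is not modeled.

```python
C2_LINES = 8192 * 8          # line locations per C2 way
C1_LINES = C2_LINES // 64    # line locations per C1 way


class TwoLevelCache:
    def __init__(self):
        self.c1 = [{}, {}]                # per way: C1 location -> line address
        self.c2 = [{}, {}]                # per way: C2 location -> line address
        self.inclusion = [set(), set()]   # per way: C2 locations with bit set
        self.lru = {}                     # C1 set index -> way to replace next

    def read(self, line_addr):
        c1_loc = line_addr % C1_LINES
        c2_loc = line_addr % C2_LINES
        # C1 hit: data is returned and C2 is not involved.
        for way in (0, 1):
            if self.c1[way].get(c1_loc) == line_addr:
                self.lru[c1_loc >> 3] = 1 - way
                return "C1 hit", way
        # C1 miss, C2 hit: C1 ignores its LRU bit and uses the C2 way.
        for way in (0, 1):
            if self.c2[way].get(c2_loc) == line_addr:
                self._fill_c1(way, c1_loc, c2_loc, line_addr)
                return "C2 hit", way
        # C1 and C2 miss: C1 picks the way by LRU and broadcasts it to C2.
        way = self.lru.get(c1_loc >> 3, 0)
        self.c2[way][c2_loc] = line_addr
        self._fill_c1(way, c1_loc, c2_loc, line_addr)
        return "miss", way

    def _fill_c1(self, way, c1_loc, c2_loc, line_addr):
        self.c1[way][c1_loc] = line_addr
        self.lru[c1_loc >> 3] = 1 - way
        # Clear the bits of the other C2 locations sharing this C1 location.
        self.inclusion[way] = {
            loc for loc in self.inclusion[way] if loc % C1_LINES != c1_loc
        }
        self.inclusion[way].add(c2_loc)
```

In this sketch, reading line 5, then lines 5 + C1_LINES and 5 + 2 * C1_LINES, and then line 5 again ends with a C2 hit in way 0 that is refilled into C1 way 0 even though the LRU bit points toward way 1, so every line held in a C1 way remains present in the same-numbered C2 way.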

The cache architecture of the C1 cache in the preferred embodiment is
preferably a write-through cache architecture and the cache
architecture of the C2 cache is preferably a write-back cache
architecture. However, the use of other cache architectures for the C1
cache and the C2 cache is also contemplated. When the processor 20
performs a memory write operation, the data is written into the C1
cache, regardless of whether the processor write is a C1 cache write
hit or write miss. In addition, processor writes initiate external
write bus cycles to write the respective data into the C2 cache. When
this occurs, the C1 cache controller 30 broadcasts the particular C1
cache way where the data was placed so that the C2 cache controller 32
can place the data in the corresponding C2 cache way. Therefore, the C1
and C2 caches allocate write misses according to the present invention.
It is preferred that the C1 and C2 caches either both allocate write
misses or both do not allocate write misses. If the C1 cache were to
not allocate writes and the C2 cache were to allocate writes, the
designs would be more complicated. The C2 cache controller 32 would
require an LRU algorithm and would need to ensure that if the C2 cache
controller LRU algorithm selected a particular C2 cache way that
contains a copy of data that is duplicated in the C1 cache, the LRU
algorithm would be overridden or the caching aborted so that multilevel
inclusion remained guaranteed.

Referring now to Figure 5, when the intelligent bus master 28 gains
control of the system bus 24 in step 200, the C2 cache controller 32
watches or "snoops" the system bus 24 in step 202 to see if the bus
master 28 performs any writes, and reads in the case of a write-back
cache, to main memory 26, and, if so, which memory location is being
accessed. The C2 cache controller 32 can perform the snooping
responsibilities for both the C1 and C2 caches because the C2 cache is
guaranteed to have a copy of all the data that resides in the C1 cache
due to the multilevel inclusion.

If the bus master 28 writes to main memory 26 in step 204 and a write
snoop hit occurs in the C2 cache in step 206, then the C2 cache
controller 32 checks the inclusion bit for the respective C2 cache
location to see whether the C1 cache controller 30 must also snoop the
memory access in step 208. If the inclusion bit is not set in step 208,
then a copy of the data from the memory location being written to does
not reside in the C1 cache, and the C1 cache controller 30 is left
alone. In this case, the C2 cache receives the new copy of data in step
210 and the C2 cache controller 32 resumes its snooping duties in step
202. If the inclusion bit on the C2 cache memory location is set in
step 208 after a snoop hit in step 206, then the C2 cache controller
directs the C1 cache controller 30 to snoop that particular memory
access in step 212. In step 214, the C1 and C2 caches each receive a
copy of the new data, and the C2 cache controller 32 resumes its
snooping duties in step 202. If a snoop miss occurs in step 206 after
the bus master 28 writes to a memory location in step 204, then the C2
cache controller 32 resumes its snooping duties in step 202. The C2
cache controller 32 continues to snoop the system bus 24 in step 202
until the bus master 28 is no longer in control of the system bus 24.

If the bus master 28 reads a main memory location in step 204 and a
read snoop hit occurs in the C2 cache in step 220, then the C2 cache
controller 32 checks the C2 cache location in step 222 to determine if
it is the owner of the respective memory location. If not, then main
memory 26 or other source services the data request and the C2 cache
controller 32 resumes snooping in step 202. If the C2 cache controller
32 is the owner of the memory location, then the C2 cache controller 32
provides the requested data to main memory 26 in step 224. The bus
master 28 reads this data in step 226 when the data has been placed on
the data bus, this being referred to as snarfing. The C2 cache
controller 32 then resumes its snooping duties in step 202. If a snoop
miss occurs in step 220 after the bus master 28 reads a memory location
in step 204, then the C2 cache controller 32 resumes its snooping
duties in step 202.

In this manner, the C1 cache controller 30 can neglect its snooping
duties until the C2 cache controller 32 determines that a snoop hit on
data held in the C1 cache has actually occurred. This allows the
processor 20 to operate more efficiently out of the C1 cache while it
does not have control of the system bus 24 because the C1 cache
controller 30 only has to snoop the system bus 24 when a C1 cache snoop
hit occurs, not on every memory write as it normally would.
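
The filtering effect of the inclusion bit on bus master writes can be illustrated with a short sketch. The function snoop_writes and its arguments are hypothetical; sets of line addresses stand in for the C2 directory and the inclusion bits.

```python
def snoop_writes(write_trace, c2_lines, included_lines):
    """Count C2 updates and C1 snoops for a sequence of bus master writes.

    write_trace    -- line addresses written by the bus master, in order
    c2_lines       -- line addresses currently held in the C2 cache
    included_lines -- subset of c2_lines whose inclusion bit is set
    """
    c2_updates = 0
    c1_snoops = 0
    for line_addr in write_trace:
        if line_addr not in c2_lines:
            continue                 # snoop miss: C2 resumes snooping
        c2_updates += 1              # write snoop hit in the C2 cache
        if line_addr in included_lines:
            c1_snoops += 1           # inclusion bit set: C1 is told to snoop
    return c2_updates, c1_snoops
```

With five writes of which four hit lines held in C2 and only one of those lines has its inclusion bit set, the C1 cache controller is disturbed once rather than five times.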

The foregoing disclosure and description of the 
invention are illustrative and explanatory thereof, and 
various changes in the size, components, construc- 
tion and method of operation may be made without 
departing from the spirit of the invention. 


Claims 

1. A method for achieving multilevel inclusion in a computer system
having a microprocessor, a system bus, a first level set associative
cache memory including a first number of ways, a first level cache
controller, a second level set associative cache including a number of
ways equal to or greater than the first number of ways of the first
level cache, wherein each of the ways in the first level cache
corresponds to at least one way in the second level cache, a second
level cache controller, means coupled to the second level cache
controller for setting and clearing an inclusion bit on data inside the
second level cache, means coupled to the first and second level cache
controllers for communicating and transmitting data between the first
level and second level caches, a bus master device, and random access
memory, the method comprising:

the first level cache controller communi- 
cating to the second level cache controller the 
particular first level cache way in which a copy of 
data received from the random access memory is 
placed on a first level and second level cache 
read miss; 

the second level cache controller placing the copy of data received
from the random access memory in the second level cache way
corresponding to the first level cache way communicated by the first
level cache controller on the first level and second level cache read
miss;

the second level cache controller communicating to the first level
cache controller the particular second level cache way where a copy of
data is located on a first level cache read miss and second level cache
read hit;

the first level cache controller placing the 
copy of data transmitted from the second level 
cache controller to the processor in the corre- 
sponding first level cache way; and 

the second level cache controller setting 
an inclusion bit on the second level cache loca- 
tion of the copy of data and clearing inclusion bits 
on any other second level cache locations that 
correspond to the first level cache location where 
the first level cache controller placed the copy of 
data.

2. The method of claim 1, wherein the first level cache controller
includes a replacement algorithm that determines which first level
cache way in which to place a received copy of data, the step of the
first level cache controller copying the data into the first level
cache way corresponding to the second level cache way including:

the first level cache controller disregarding its replacement algorithm
on first level cache read miss and second level cache read hit cases.

3. The method of claim 1, further comprising:

the first level cache controller communicating to the second level
cache controller the particular first level cache way in which a copy
of received data is placed on a processor write; and

the second level cache controller placing the copy of received data in
the second level cache way corresponding to the first level cache way
communicated by the first level cache controller.

4. The method of claim 1, wherein greater than one way in the first
level cache cannot correspond to one cache way in the second level
cache and greater than one cache way in the second level cache can
correspond to one way in the first level cache.

5. The method of claim 1, further comprising:

the second level cache controller snooping the system bus when the
processor does not have control of the system bus to determine if the
bus master device is writing to a cached memory location;

the second level cache controller checking the inclusion bit on a
second level cache location where a second level cache write snoop hit
occurs to determine if a copy of data from the random access memory
location being written to resides in the first level cache; and

the second level cache controller directing the first level cache
controller to snoop the system bus if said inclusion bit is set.

6. The method of claim 5, wherein the second level cache is a
write-back cache, the method further comprising:

the second level cache controller snooping the system bus when the
processor does not have control of the system bus to determine if the
bus master device is reading a cached memory location;

the second level cache controller determining if the second level cache
has an updated version of the data residing in the requested memory
location on a second level cache read snoop hit;

the second level cache controller providing the requested data to main
memory if the second level cache has an updated version of the data;
and

the bus controller reading the requested data provided by the second
level cache controller.

7. An apparatus for achieving multilevel inclusion in a computer
system, comprising:

a system bus;

a microprocessor coupled to said system bus;

a first level cache memory coupled to said microprocessor and including
a first number of ways;

a first level cache controller coupled to said first level cache, said
microprocessor and said system bus and including an output for
transmitting way information and an input for receiving way
information;

a second level cache of a size greater than or equal to the size of the
first level cache which includes a number of ways equal to or greater
than the first number of ways of the first level cache, wherein each of
the ways in the first level cache corresponds to at least one way in
the second level cache and which includes inclusion information
indicating presence of data in the second level cache that is
duplicated in the first level cache;

a second level cache controller coupled to said system bus, said second
level cache, said microprocessor, and said first level cache controller
and including an input coupled to said first level cache controller way
information output for receiving way information and an output coupled
to said first level cache controller way information input for
transmitting way information; and

random access memory coupled to said system bus;

wherein on a first and second level cache read miss said first level
cache controller transmits way information to said second level cache
controller and said second level cache controller places received data
in a way of the second level cache corresponding to the received way
information,

wherein on a first level cache read miss and a second level cache read
hit said second level cache controller transmits way information to
said first level cache controller and said first level cache controller
places received data in a way of the first level cache corresponding to
the received way information, and

wherein said second level cache controller sets the inclusion bit in
the second level cache location which contains the data placed in the
first level cache and clears the inclusion bits of any other second
level cache locations which correspond to the first level cache
location where the data was placed.

8. The apparatus of claim 7, wherein said first level cache controller
includes a replacement means that determines which first level cache
way in which to place a received copy of data, wherein said first level
cache controller disregards said replacement means on first level cache
read miss and second level cache read hit cases.

9. The apparatus of claim 7, wherein greater than one way in the first
level cache cannot correspond to one cache way in the second level
cache and greater than one way in the second level cache can correspond
to one way in the first level cache.

10. The apparatus of claim 7, wherein on a processor write said first
level cache controller transmits way information to said second level
cache controller and said second level cache controller places received
data in a way of the second level cache corresponding to the received
way information.

11. The apparatus of claim 7, further comprising:

a bus master device coupled to said system bus; and

wherein said first level cache controller includes means for snooping
the system bus when said microprocessor does not have control of said
system bus to determine if the bus master device is writing to a random
access memory location that is cached in the first level cache, and

wherein said second level cache controller further includes:

means for snooping the system bus when said microprocessor does not
have control of said system bus to determine if the bus master device
is writing to a random access memory location that is cached in the
second level cache;

means for checking the inclusion bit on a second level cache location
where a second level cache write snoop hit occurs to determine if a
copy of data from said random access memory location being written to
also resides in said first level cache; and

means coupled to said first level cache controller which directs said
first level cache controller to snoop the system bus if said inclusion
bit is set.


12. The apparatus of claim 11, further comprising:

said second level cache being a write-back cache, wherein said second
level cache controller further includes:

means for snooping the system bus when said microprocessor does not
have control of said system bus to determine if the bus master device
is reading a random access memory location that is cached in the second
level cache;

means for determining whether the second level cache includes an
updated version of the data residing in the requested memory location
when a second level cache read snoop hit occurs; and

means for providing the requested data to main memory if the second
level cache has an updated version of the data, wherein the bus
controller reads the requested data provided by the second level cache
controller.

A method for achieving multilevel inclusion in a computer system with
first and second level caches. The caches align themselves on a "way"
basis by their respective cache controllers communicating with each
other as to which blocks of data they are replacing and which of their
cache ways are being filled with data. On first and second level cache
read misses the first level cache controller provides way information
to the second level cache controller to allow received data to be
placed in the same way. On first level cache read misses and second
level cache read hits, the second level cache controller provides way
information to the first level cache controller, which ignores its
replacement indication and places data in the indicated way. On
processor writes the first level cache controller caches the writes and
provides the way information to the second level cache controller which
also caches the writes and uses the way information to select the
proper way for data storage. An inclusion bit is set on data in the
second level cache that is duplicated in the first level cache.
Multilevel inclusion allows the second level cache controller to
perform the principal snooping responsibilities for both caches,
thereby enabling the first level cache controller to avoid snooping
duties until a first level cache snoop hit occurs. On a second level
cache snoop hit the second level cache controller checks the respective
inclusion bit to determine if a copy of this data also resides in the
first level cache. The first level cache controller is directed to
snoop the bus only if the respective inclusion bit is set.